Final Exam for the Second Semester of the 2014/2015 Academic Year



Student name (four parts):
University ID:
Number of exam pages: 6
Instructor: Dr. Mohammed Ahmed Ghazal
Number of questions: 5
Course name: Data Mining
Course number:
Exam duration: Two hours
Exam time:
Exam date:

First Question: Mark True (T) for a correct statement and False (F) for a wrong statement: (10 points)

1. Domain expertise is important for understanding the data and the problem and for interpreting the results. ( )
2. Data cleaning is concerned only with missing values and noisy data. ( )
3. The k-nearest neighbor approach is used to clean inconsistent data. ( )
4. The binning method is used to handle noisy data. ( )
5. Outliers may be detected by classification, where values that fall outside the set of class values may be considered outliers. ( )
6. Inter-cluster distances (among the clusters) are maximized while intra-cluster distances (among the points of a cluster) are minimized. ( )
7. The Jaccard coefficient of dissimilarity d(i, j) is a nonnegative number that is close to 1 when the objects i and j are highly similar or near each other. ( )
8. Classification is used to predict whether the weather on a particular day will be sunny, rainy, or cloudy. ( )
9. The training set of a data set is used to build the classification model, and the test set is used to validate it. ( )
10. The k-nearest neighbor algorithm is used for clustering the whole data set into a set of clusters. ( )

Second Question: Choose the correct answer for each question: (10 points)

1- Data mining is best described as the process of
   a. identifying patterns in data.
   b. deducing relationships in data.
   c. representing data.
   d. simulating trends in data.

2- Which of the following activities is/are considered a data mining task?
   (a) Dividing the customers of a company according to their profitability.
   (b) Predicting the future stock price of a company using historical records.
   (c) Identifying the average salaries of Palestinian graduates in the last decade.
   (d) Monitoring the heart rate of a patient for abnormalities.
   (e) b and d
   (f) All of the above

3- This technique uses mean and standard deviation scores to transform real-valued attributes.
   a. decimal scaling
   b. min-max normalization
   c. z-score normalization
   d. logarithmic normalization

4- Data used to build a data mining model.
   a. validation data
   b. training data
   c. test data
   d. hidden data

5- A nearest neighbor approach is best used
   a. with large-sized datasets.
   b. when irrelevant attributes have been removed from the data.
   c. when a generalized model of the data is desirable.
   d. when an explanation of what has been found is of primary importance.

6- Supervised learning differs from unsupervised clustering in that supervised learning requires
   a. at least one input attribute.
   b. input attributes to be categorical.
   c. at least one output attribute.
   d. output attributes to be categorical.

7- Which of the following is a valid production rule for the decision tree below?

   Business Appointment?
       No  -> Temp above 70?
                  No  -> Decision = wear jeans
                  Yes -> Decision = wear shorts
       Yes -> Decision = wear slacks

   a. IF Business Appointment = No & Temp above 70 = No THEN Decision = wear slacks
   b. IF Business Appointment = Yes & Temp above 70 = Yes THEN Decision = wear shorts

   c. IF Temp above 70 = No THEN Decision = wear shorts
   d. IF Business Appointment = No & Temp above 70 = No THEN Decision = wear jeans

Use these tables to answer questions 8 and 9.

   Single Item Sets                                Number of Items
   Magazine Promo = Yes                            7
   Watch Promo = No                                6
   Life Ins Promo = Yes                            5
   Life Ins Promo = No                             5
   Card Insurance = No                             8
   Sex = Male                                      6

   Two Item Sets                                   Number of Items
   Magazine Promo = Yes & Watch Promo = No         4
   Magazine Promo = Yes & Life Ins Promo = Yes     5
   Magazine Promo = Yes & Card Insurance = No      5
   Watch Promo = No & Card Insurance = No          5

8- One two-item set rule that can be generated from the tables above is:
   IF Magazine Promo = Yes THEN Life Ins Promo = Yes
   The confidence for this rule is:
   a. 5 / 7
   b. 5 / 12
   c. 7 / 12
   d. [value missing]

9- Based on the two-item set table, which of the following is not a possible two-item set rule?
   a. IF Life Ins Promo = Yes THEN Magazine Promo = Yes
   b. IF Watch Promo = No THEN Magazine Promo = Yes

   c. IF Card Insurance = No THEN Magazine Promo = Yes
   d. IF Life Ins Promo = No THEN Card Insurance = No

10- Supervised learning and unsupervised clustering both require at least one
   a. hidden attribute.
   b. output attribute.
   c. input attribute.
   d. categorical attribute.

Third Question: Answer the following questions: (13 points)

Question 3.1: Construct a decision tree with root node Type from the data in the table below. The first row contains the attribute names. Each row after the first represents the values for one data instance. The output attribute is Class. (6 points)

Question 3.2: Extract classification rules from the decision tree. (2 points)

   Scale   Type   Shade   Texture   Class
   One     One    Light   Thin      A
   Two     One    Light   Thin      A
   Two     Two    Light   Thin      B
   Two     Two    Dark    Thin      B
   Two     One    Dark    Thin      C
   One     One    Dark    Thin      C
   One     Two    Light   Thin      C

Question 3.3: Bin the following data set using supervised discretization. (5 points)

   Temperature   [values missing]
   Play?         Y  Y  N  Y  N  Y  N  Y  Y  N  N

Question 4: Answer the following questions: (12 points)

Question 4.1: Explain the differences between k-means and k-medoids. (2 points)

Question 4.2: Use the k-means algorithm to cluster the following eight points, where (x, y) represents a location. Suppose that k = 3. (5 points)

   A1(2, 10)  A2(2, 5)  A3(8, 4)  A4(5, 8)  A5(7, 5)  A6(6, 4)  A7(1, 2)  A8(4, 9)

Question 4.3: Given the following data set of users' web access, cluster it using a hierarchical clustering algorithm. (5 points)

            Site 1   Site 2   Site 3   Site 4   Site 5
   User
   User
   User
   User
   User
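The procedure that Question 4.2 asks for can be sketched in a few lines of Python. This is only an illustrative sketch, not an official solution: the exam does not state the starting centroids or the distance measure, so the choice of A1, A4 and A7 as initial centroids and the use of Euclidean distance are assumptions, and the names POINTS and k_means are made up for this example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

# The eight points from Question 4.2.
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def k_means(points, initial_centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = {
            name: min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for i in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids


# Assumption: A1, A4 and A7 are the starting centroids (not given in the exam).
assignment, centroids = k_means(POINTS, [POINTS[n] for n in ("A1", "A4", "A7")])
clusters = [sorted(n for n, c in assignment.items() if c == i) for i in range(3)]
print(clusters)
print(centroids)
```

Under these assumptions the sketch settles on the clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7}. A different choice of starting centroids can lead to a different grouping, which is one reason the initial centroids should be stated in a written answer.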

Question 5: Use the Naïve Bayes classifier to classify the new case below, based on the car theft data set. (5 points)

   Color   Type   Origin     Stolen
   Red     SUV    Domestic   ??

Where: the car theft data set has three main attributes: Color, Type, and Origin. The class Stolen can be either Yes or No.

   Example #   Color    Type    Origin     Stolen
   1           Red      Sport   Domestic   Yes
   2           Red      Sport   Domestic   No
   3           Red      Sport   Domestic   Yes
   4           Yellow   Sport   Domestic   No
   5           Yellow   Sport   Imported   Yes
   6           Yellow   SUV     Imported   No
   7           Yellow   SUV     Imported   Yes
   8           Yellow   SUV     Domestic   No
   9           Red      SUV     Imported   No
   10          Red      Sport   Imported   Yes

End of questions. With my best wishes,
Dr. Mohammed Ahmed Ghazal
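The calculation behind Question 5 can be sketched as below. This is an illustrative sketch rather than an official solution: it assumes plain relative-frequency estimates with no smoothing, and the names DATA and naive_bayes_scores are made up for this example. Exact fractions are used so the intermediate values can be compared with a hand calculation.

```python
from collections import Counter
from fractions import Fraction

# Car theft training set from Question 5: (color, type, origin, stolen).
DATA = [
    ("Red", "Sport", "Domestic", "Yes"),
    ("Red", "Sport", "Domestic", "No"),
    ("Red", "Sport", "Domestic", "Yes"),
    ("Yellow", "Sport", "Domestic", "No"),
    ("Yellow", "Sport", "Imported", "Yes"),
    ("Yellow", "SUV", "Imported", "No"),
    ("Yellow", "SUV", "Imported", "Yes"),
    ("Yellow", "SUV", "Domestic", "No"),
    ("Red", "SUV", "Imported", "No"),
    ("Red", "Sport", "Imported", "Yes"),
]


def naive_bayes_scores(data, case):
    """Return P(class) * product of P(attribute value | class) per class."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(data))  # prior P(class)
        for idx, value in enumerate(case):
            matches = sum(
                1 for row in data if row[-1] == label and row[idx] == value
            )
            # Conditional probability as a relative frequency (assumption:
            # no Laplace smoothing is applied).
            score *= Fraction(matches, n_label)
        scores[label] = score
    return scores


# The new case to classify: Red, SUV, Domestic.
scores = naive_bayes_scores(DATA, ("Red", "SUV", "Domestic"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

With these counts the unnormalized scores are 3/125 for Yes and 9/125 for No, so under the stated assumptions the sketch labels the new case as Stolen = No.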
